Data augmentation and enhancement for multimodal speech emotion recognition
Authors
Abstract
A fundamental human need is interaction with others, for example through conversation or speech. It is therefore crucial to analyze speech with computer technology to determine emotions. The speech emotion recognition (SER) method detects emotions in speech by examining various aspects. SER is a supervised method that decides the emotion class in speech. This research proposes a multimodal SER model that uses the attention mechanism, one of the deep-learning-based enhancement techniques. It also addresses the imbalanced dataset problem in the SER field by using generative adversarial networks (GAN) as a data augmentation technique. The proposed model achieved an excellent evaluation performance of 0.96 (96%) for the proposed GAN configuration. The work showed that the GAN method in the multimodal SER model could enhance performance and create a balanced dataset.
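As a small illustration of the balancing step mentioned in the abstract, the sketch below counts how many synthetic samples each emotion class would need in order to match the largest class. It is not the paper's method: the function name `synthetic_samples_needed` and the label distribution are hypothetical, and the GAN that would actually generate the samples is out of scope here.

```python
from collections import Counter


def synthetic_samples_needed(labels):
    """Return, per class, how many extra samples are needed to match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())  # size of the majority class
    return {emotion: target - n for emotion, n in counts.items()}


# Hypothetical label distribution, for illustration only (not taken from the paper).
labels = ["neutral"] * 5 + ["happy"] * 3 + ["angry"] * 2
deficit = synthetic_samples_needed(labels)
print(deficit)
```

A generative model would then be asked to produce `deficit[emotion]` samples for each minority class, so that every class ends up with the same number of training examples.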
Similar resources
Multimodal emotion recognition from expressive faces, body gestures and speech
In this paper we present a multimodal approach for the recognition of eight emotions that integrates information from facial expressions, body movement and gestures and speech. We trained and tested a model with a Bayesian classifier, using a multimodal corpus with eight emotions and ten subjects. First individual classifiers were trained for each modality. Then data were fused at the feature l...
Speech Emotion Recognition with Data Augmentation and Layer-wise Learning Rate Adjustment
In this work, we design a neural network for recognizing emotions in speech, using the standard IEMOCAP dataset. Following the latest advances in audio analysis, we use an architecture involving both convolutional layers, for extracting high-level features from raw spectrograms, and recurrent ones for aggregating long-term dependencies. Applying techniques of data augmentation, layerwise learnin...
A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making a natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from speech signal becomes a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...
Multimodal Emotion Recognition Integrating Affective Speech with Facial Expression
In recent years, emotion recognition has attracted extensive interest in signal processing, artificial intelligence and pattern recognition due to its potential applications to human-computer interaction (HCI). Most previously published works in the field of emotion recognition are devoted to performing emotion recognition by using either affective speech or facial expression. However, affective spe...
Multimodal Emotion Recognition
Multimodal fusion is the process whereby two or more forms of input are gathered together in order to produce a higher overall classification accuracy than individual unimodal systems. This is a popular technique in emotion recognition. In this study, we attempted to discover how much we could improve upon individual unimodal systems using decision level fusion. To accomplish this, we acquired ...
Journal
Journal title: Bulletin of Electrical Engineering and Informatics
Year: 2023
ISSN: 2302-9285
DOI: https://doi.org/10.11591/eei.v12i5.5031